Count data

In total, 5955 crime events were recorded through September 2021.

Violation events

Of these crime events, 775 were classified as violations.

Misdemeanor events

Of these crime events, 3169 were classified as misdemeanors.

Felony events

Of these crime events, 2040 were classified as felonies.
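
The violation, misdemeanor, and felony counts above could be obtained by tallying the `law_cat` column. The following is only a minimal sketch: `toy_crime` is a small made-up tibble standing in for `raw_sub_crime`, and the lower-case category labels are an assumption about how `law_cat` is coded.

```r
library(dplyr)

# Hypothetical stand-in for raw_sub_crime; values are illustrative only
toy_crime = 
  tibble(
    crime_event = c("harrassment 2", "petit larceny", "robbery", "petit larceny"),
    law_cat = c("violation", "misdemeanor", "felony", "misdemeanor")
  )

# Number of events per legal category, most frequent first
law_cat_counts = 
  toy_crime %>% 
  count(law_cat, name = "event_num", sort = TRUE)

law_cat_counts
```

Applied to the real data, the same `count()` call would return one row per category, and the rows should sum to the total number of events.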

Crime events vs. Time

Number of crime events each month

The number of crime events is roughly comparable from month to month. March has the most events and February the fewest; overall, the number of crime events tends to increase over the months.

sub_crime_month = 
  raw_sub_crime %>% 
  filter(start_date > "2021-01-01") %>% 
  select(start_date, start_time, crime_event, law_cat) %>% 
  mutate(start_date = substring(start_date, 1, 7)) 
  
plot_1 = 
  sub_crime_month %>% 
  group_by(start_date) %>% 
  summarise(event_num = n()) %>% 
  plot_ly(
    x = ~start_date, y = ~event_num, type = "bar"
  )

layout(plot_1, title = "Crime events by month", xaxis = list(title = "Month"), yaxis = list(title = "Number of Crime Events"))

Number of crime events each week

At weekly resolution, the highest number of crime events occurs in the eleventh week, which falls in March. There appears to be a local maximum roughly every ten weeks, and there is a sudden increase at the beginning of the year and a sudden decrease at the end of the year.

sub_crime_week = 
  raw_sub_crime %>% 
  select(start_date, start_time, crime_event, law_cat) %>% 
  mutate(week = cut.Date(start_date, breaks = "1 week", labels = FALSE)) %>% 
  arrange(week) %>% 
  group_by(week) %>% 
  summarise(event_num = n())
  
plot_2 = 
  sub_crime_week %>% 
    plot_ly(
    x = ~week, y = ~event_num, type = "scatter", mode = "markers"
  )

layout(plot_2, title = "Crime events over weeks", xaxis = list(title = "Week"), yaxis = list(title = "Number of Crime Events"))
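
The spacing between local maxima can be checked numerically instead of being read off the plot. The sketch below is one possible way to do so, assuming a weekly summary with the same `week` and `event_num` columns as `sub_crime_week`; `toy_week` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sub_crime_week; values are illustrative only
toy_week = 
  tibble(
    week = 1:8,
    event_num = c(10, 14, 12, 9, 15, 18, 11, 13)
  )

# A week is a local maximum if it has more events than both neighbouring weeks;
# the first and last weeks have no complete neighbourhood and are dropped by filter()
local_max = 
  toy_week %>% 
  mutate(is_peak = event_num > lag(event_num) & event_num > lead(event_num)) %>% 
  filter(is_peak)

# Gap, in weeks, between consecutive local maxima
peak_gap = diff(local_max$week)
```

On the real data, values of `peak_gap` close to 10 would support the roughly ten-week spacing described above.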

Top 5 crime events vs. Occurrence time

The most frequent crime events mainly happen in the afternoon and early evening, from 12:00 to 20:00, during the first 9 months of 2021. In that interval, criminal mischief is the most frequent event. In New York City, criminal mischief includes intentionally damaging property and participating in the destruction of an abandoned building.

Assault in the third degree is the second most frequent crime event in that interval; it involves intent to cause physical injury to another person.

Harassment in the second degree is the third most frequent crime event in that interval; it involves intent to harass, annoy, or alarm another person, for example by striking them or otherwise making physical contact with them.

crime_occ_time = 
  raw_sub_crime %>% 
  mutate(event_time = ordered(event_time, levels = c("2 AM","6 AM","10 AM","2 PM","6 PM","10 PM"))) %>% 
  filter(crime_event %in% c("criminal mischief & related of","assault 3 & related offenses","harrassment 2","grand larceny","dangerous drugs"))

plot_3 = 
  crime_occ_time %>% 
  ggplot(aes(x = event_time, fill = crime_event)) + 
  # event_time is discrete, so count rows per level with geom_bar()
  geom_bar(width = 0.9) + 
  labs(
    title = "Frequency of crime events vs. Time points", 
    x = "Occurrence time", 
    y = "Frequency of crime events") + 
  theme_bw() + 
  theme(
    plot.title = element_text(hjust = 1), 
    legend.position = "bottom",
    legend.text = element_text(size = 8)) + 
  # the legend belongs to the fill aesthetic, not colour
  guides(fill = guide_legend(nrow = 2))

ggplotly(plot_3) %>%
  layout(legend = list(
      orientation = "h",
      xanchor = "center",
      yanchor = "top",
      x = 0.3,
      y = - 0.3
    )
  )

Degrees of crime event

Among the 10 most frequent crime events, violations occur almost exclusively as harassment in the second degree, and that crime type has essentially no events of any other degree. Misdemeanors occur mainly as assault in the third degree and criminal mischief. Felonies appear in almost every one of the 10 most frequent crime events.

sub_crime_degree = 
  raw_sub_crime %>% 
  filter(crime_event %in% c("criminal mischief & related of","assault 3 & related offenses","harrassment 2","grand larceny","dangerous drugs","felony assault", "robbery", "petit larceny", "forgery", "sex crimes")) %>% 
  count(crime_event, law_cat)

plot_4 = 
  sub_crime_degree %>% 
    plot_ly(
    x = ~crime_event, y = ~n, color = ~law_cat, type = "bar"
  )

layout(plot_4, title = "Number of crime events by degree", xaxis = list(title = "Crime events"), yaxis = list(title = "Number of Crime Events"))

Proceeding time

Overall, the median proceeding time increases with the degree of the crime event. Excluding some extreme outliers, felony events have the largest median proceeding time, possibly because they are harder for officials to handle. They also have the widest range of proceeding times, which may depend on the specifics of each case.

Violation events have the lowest median proceeding time, so they may be much easier for officials to deal with.

crime_prcd_time = 
  raw_sub_crime %>% 
  drop_na(start_time, end_time) %>%
  mutate(prcd_time = difftime(end, start, units = "mins")) %>% 
  filter(prcd_time < 35) %>% 
  filter(prcd_time != 0) %>% 
  mutate(quarters = quarters(as.Date(start_date)))

plot_5 = 
  crime_prcd_time %>% 
  plot_ly(y = ~ prcd_time, color = ~ law_cat, type = "box")

layout(plot_5, title = "Crime type vs. Proceeding time", xaxis = list(title = "Crime type"), yaxis = list(title = "Proceeding time (mins)")
    )
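
The medians compared above can also be tabulated directly rather than read from the box plot. This is a minimal sketch, assuming a data frame shaped like `crime_prcd_time` with a `law_cat` column and a `prcd_time` column of class `difftime`; `toy_prcd` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for crime_prcd_time; values are illustrative only
toy_prcd = 
  tibble(
    law_cat = c("violation", "violation", "misdemeanor", "misdemeanor", "felony", "felony"),
    prcd_time = as.difftime(c(5, 7, 8, 12, 10, 30), units = "mins")
  )

# Median and interquartile range of proceeding time (minutes) per category,
# ordered from the shortest to the longest median
prcd_summary = 
  toy_prcd %>% 
  mutate(prcd_mins = as.numeric(prcd_time, units = "mins")) %>% 
  group_by(law_cat) %>% 
  summarise(
    median_mins = median(prcd_mins),
    iqr_mins = IQR(prcd_mins)
  ) %>% 
  arrange(median_mins)

prcd_summary
```

Converting the `difftime` column to plain numeric minutes first keeps the units explicit in the summary table.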

Day of week vs. Occurrence time

The heat map shows that most crime events occurred in the afternoon from Tuesday to Thursday. The period around noon (12:00) on Tuesday and Wednesday may be the most dangerous time of the week, whereas comparatively few crime events occur on Saturday and Sunday.

sub_crime_dow = 
  raw_sub_crime %>% 
  mutate(day_of_week = wday(as.Date(start_date), label=TRUE, abbr = FALSE)) %>% 
  mutate(day_of_week = fct_relevel(day_of_week, "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday", "Sunday")) %>% 
  separate(start_time, into = c("hour", "minute", "second"), sep = ":") %>% 
  select(day_of_week, hour, crime_event) %>% 
  group_by(day_of_week, hour) %>% 
  summarise(crime_num = n())

plot_6 = 
  sub_crime_dow %>% 
  plot_ly(
    x = ~ hour, y = ~ day_of_week, z = ~ crime_num, type = "heatmap", colors = "BuPu"
  ) %>%
  colorbar(title = "Events Number", x = 1.1, y = 0.8) 

layout(plot_6, title = "Crime frequency: Day vs. Hour", xaxis = list(title = "Hour"), yaxis = list(title = "Day of week")
    )